Context
The objective of this project is to develop a predictive model for corporate bankruptcy from historical financial and default data. Companies may encounter various forms of financial distress, such as missed payments, distressed exchanges, or formal bankruptcy proceedings like Chapter 7 and Chapter 11 filings. By analyzing these events and their associated financial indicators, the project builds a dataset that captures each company’s fiscal health over time and uses machine learning to forecast the likelihood of default.
The work combines three datasets (Compustat, LoPucki, and Moody’s), each of which tracks different aspects of corporate financial distress. After preprocessing to align and clean the data, the most recent financial and default indicators for each company are extracted. These serve as the target variable \(Y\) for bankruptcy risk, which will later be matched with a comprehensive feature set \(X\) to train the predictive model. The analysis aims to provide insight into the patterns that precede default and to contribute a robust tool for assessing bankruptcy risk in real-world scenarios.
Building our dataset
In this section, we build the target variable \(Y\) for bankruptcy prediction and the feature set \(X\) used to train the predictive model.
Building Y - Bankruptcy Data
To build our target variable we will go through the following steps:
- Extracting the relevant data from the Compustat, LoPucki, and Moody’s datasets.
- Merging these datasets to create a comprehensive dataset that captures the financial health and default history of each company.
- Creating the target variable \(Y\) by flagging bankruptcy events that occur within one year of each fiscal year.
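The labeling step in the list above can be sketched roughly as follows. This is a minimal illustration on toy data, not the project's actual pipeline. The column names (`company_id`, `fiscal_year_end`, `event_date`), the helper `build_target`, and the 365-day window are all assumptions made for the example. It also assumes that "within one year" means the event falls in the 365 days after the fiscal year end, and that the LoPucki and Moody’s events have already been combined into a single table.

```python
import pandas as pd

# Toy company-year panel standing in for Compustat (column names are assumptions).
panel = pd.DataFrame({
    "company_id": ["A", "A", "B", "B"],
    "fiscal_year_end": pd.to_datetime(
        ["2018-12-31", "2019-12-31", "2018-12-31", "2019-12-31"]
    ),
})

# Toy default events standing in for the combined LoPucki and Moody's records.
events = pd.DataFrame({
    "company_id": ["A"],
    "event_date": pd.to_datetime(["2020-06-15"]),
})


def build_target(panel, events, horizon_days=365):
    """Hypothetical helper: add a binary column Y that is 1 when an event
    falls in (fiscal_year_end, fiscal_year_end + horizon_days], else 0."""
    # A left merge keeps every company-year, including companies with no events.
    merged = panel.merge(events, on="company_id", how="left")
    days_to_event = (merged["event_date"] - merged["fiscal_year_end"]).dt.days
    # Rows without an event have NaN here, and NaN comparisons evaluate to False.
    merged["hit"] = (days_to_event > 0) & (days_to_event <= horizon_days)
    # A company can have several events; any hit inside the window marks the year.
    labeled = (
        merged.groupby(["company_id", "fiscal_year_end"], as_index=False)["hit"]
        .any()
        .rename(columns={"hit": "Y"})
    )
    labeled["Y"] = labeled["Y"].astype(int)
    return labeled


labeled = build_target(panel, events)
print(labeled)
```

In this toy example only company A's 2019 fiscal year is labeled 1, since the June 2020 event falls inside its one-year window but outside the window for 2018. Under these assumptions, using a left merge followed by a grouped `any` keeps exactly one row per company-year even when a company has multiple recorded events.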